1 Introduction

In this paper, we present some results on information, complexity and entropy 
as defined below and we discuss their relations with the Kolmogorov-Sinai 
entropy which is the most important invariant of a dynamical system. These 
results have the following features and motivations: 

• we give a new computable definition of information and complexity
which yields a computable characterization of the K-S entropy;

• these definitions make sense even for a single orbit and can be measured 
by suitable data compression algorithms; hence they can be used in 
simulations and in the analysis of experimental data; 

• the asymptotic behavior of these quantities allows one to compute not only
the Kolmogorov-Sinai entropy but also other quantities which measure the
chaotic behavior of a dynamical system even in the case of null entropy.

1.1 Information, complexity and entropy 

Information, complexity and entropy are words which in the mathematical 
and physical literature are used with different meanings and sometimes even 
as synonyms (e.g. the Algorithmic Information Content of a string is called 
Kolmogorov complexity; the Shannon entropy sometimes is confused with 
Shannon information etc.). For this reason, an intuitive definition of these 
notions as they will be used in this paper will be given soon. In our approach 
the notion of information is the basic one. Given a finite string s (namely a
finite sequence of symbols taken in a given alphabet), the intuitive meaning
of the quantity of information $I(s)$ contained in s is the following:

$I(s)$ is the length of the smallest binary message from which you can
reconstruct s.

In his pioneering work, Shannon defined the quantity of information as
a statistical notion using the tools of probability theory. Thus, in Shannon's
framework, the quantity of information contained in a string depends
on its context. For example, the string 'the' contains a certain amount of
information when it is considered as a string coming from the English language.
The same string 'the' contains much more Shannon information when it is
considered as a string coming from the Italian language, because it is much
rarer in Italian. Roughly speaking, the Shannon information of a string is
the absolute value of the logarithm of its probability.
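The rough definition above can be made concrete with a minimal sketch. The logarithm is taken in base 2 so that the result is in bits (an assumption; the text does not fix the base), and the two probabilities are hypothetical values chosen only to illustrate how the same string carries more Shannon information in a context where it is rarer.

```python
import math

def shannon_information(probability: float) -> float:
    """Shannon information, in bits, of an event with the given probability."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must lie in (0, 1]")
    return abs(math.log2(probability))

# Hypothetical probabilities, chosen only to illustrate the dependence on context.
p_common = 1 / 16     # the string in a source where it is frequent
p_rare = 1 / 4096     # the same string in a source where it is rare

print(shannon_information(p_common))  # 4.0 bits
print(shannon_information(p_rare))    # 12.0 bits
```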



However there are measures of information which depend intrinsically on 
the string and not on its probability within a given context. We will adopt this 
point of view. An example of these measures of information is the Algorithmic 
Information Content (AIC). In order to define it, it is necessary to define the 
partial recursive functions. We limit ourselves to an intuitive idea which
is very close to the formal definition: we can consider a partial recursive
function as a computer C which takes a program p (namely a binary string)
as an input, performs some computations and gives a string $s = C(p)$, written
in the given alphabet, as an output. The AIC of a string s is defined as the
length of the shortest binary program p which gives s as its output, namely

$$I_{AIC}(s) = \min\{|p| : C(p) = s\}$$

where $|p|$ is the length of the string p. From a heuristic point of view, the
shortest program p which outputs the string s is a sort of optimal encoding
of s. The information that is necessary to reconstruct the string is contained
in the program.

Another measure of the information content of a finite string can also be
defined by a lossless data compression algorithm Z which satisfies suitable
properties, discussed in Section 2.1. We can define the information
content of the string s as the length of the compressed string $Z(s)$,
namely

$$I_Z(s) = |Z(s)| .$$

The advantage of using a compression algorithm lies in the fact that,
in this way, the information content $I_Z(s)$ turns out to be a computable
function. For this reason we will call it Computable Information Content
(CIC).
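As a rough illustration of the CIC, the sketch below measures the compressed length of a string in bits. It uses Python's `zlib` (a DEFLATE implementation) purely as a readily available lossless compressor; this is an assumption made for the example, and nothing here claims that `zlib` satisfies the properties required of Z by the theory.

```python
import zlib

def information_content_bits(s: bytes) -> int:
    """I_Z(s) = |Z(s)| in bits, with Z taken to be zlib at maximum compression level."""
    return 8 * len(zlib.compress(s, 9))

regular = b"01" * 5000                       # a highly regular string of 10000 symbols
print(information_content_bits(regular))    # far fewer than 8 * 10000 bits

# Z is lossless: the original string can be reconstructed from Z(s).
assert zlib.decompress(zlib.compress(regular, 9)) == regular
```

Since the compressor is an ordinary program, the function above is computable for every input, which is the point of replacing the AIC by the CIC.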

If $\omega$ is an infinite string, in general its information is infinite; however, it
is possible to define another notion: the complexity. The complexity $K(\omega)$
of an infinite string $\omega$ is the mean information $I$ contained in a digit of $\omega$,
namely

$$K(\omega) = \limsup_{n\to\infty} \frac{I(\omega^n)}{n} \qquad (1)$$

where $\omega^n$ is the string obtained by taking the first n elements of $\omega$. If we equip
the set $\Omega$ of all infinite strings with a probability measure $\mu$, then the entropy
$h_\mu$ of $(\Omega, \mu)$ can be defined as the expectation value of the complexity:

$$h_\mu = \int_\Omega K(\omega)\, d\mu . \qquad (2)$$



If $I(\omega) = I_{AIC}(\omega)$ or $I(\omega) = I_Z(\omega)$, then, under suitable assumptions on Z
and $\mu$, $h_\mu$ turns out to be equal to the Shannon entropy. Notice that, in
this approach, the probabilistic aspect does not appear in the definition of
information or complexity, but only in the definition of entropy.

1.2 Dynamical systems and chaos 

Chaos, unpredictability and instability of the behavior of dynamical systems 
are strongly related to the notion of information. The Kolmogorov-Sinai en- 
tropy can be interpreted as an average measure of information that is neces- 
sary to describe a step of the evolution of a dynamical system. The traditional 
definition of the Kolmogorov-Sinai entropy is given by the methods of prob- 
abilistic information theory. It is the translation of the Shannon entropy into 
the world of dynamical systems. 

We have seen that the information content of a string can be defined either
with probabilistic methods or using the AIC or the CIC. Similarly, the
K-S entropy of a dynamical system can be defined in different ways. The
probabilistic method is the usual one; the AIC method was introduced
by Brudno [?]; the CIC method was introduced in [?] and [?]. So, in
principle, it is possible to define the entropy of a single orbit of a dynamical
system (which we will call, as has sometimes been done in the
literature, the complexity of the orbit). There are different ways to do this (see
[?], [?], [?], [?], [?]). In this paper, we will introduce a method which can
be implemented in numerical simulations. We now describe it briefly.

Using the usual procedure of symbolic dynamics, given a partition $\alpha$ of
the phase space of the dynamical system $(X, \mu, T)$, it is possible to associate
a string $\Phi_\alpha(x)$ to the orbit having x as initial condition. If $\alpha = \{A_1, \dots, A_l\}$,
then $\Phi_\alpha(x) = (s_0, s_1, \dots, s_k, \dots)$ if and only if

$$T^k x \in A_{s_k} \quad \forall k .$$

If we perform an experiment, the orbit of a dynamical system can be
described only with a given degree of accuracy, described by the partition of
the phase space X. A more accurate measurement device corresponds to a
finer partition of X. The symbolic orbit $\Phi_\alpha(x)$ is a mathematical idealization
of these measurements. We can define the complexity $K(x, \alpha)$ of the orbit
with initial condition x with respect to the partition $\alpha$ in the following way:

$$K(x, \alpha) = \limsup_{n\to\infty} \frac{I(x, \alpha, n)}{n}$$

where

$$I(x, \alpha, n) := I(\Phi_\alpha(x)^n) ; \qquad (3)$$

here $\Phi_\alpha(x)^n$ represents the first n digits of the string $\Phi_\alpha(x)$. Letting $\alpha$
vary among all the computable partitions (see Section 3.2), we set

$$K(x) = \sup_\alpha K(x, \alpha) .$$

The number $K(x)$ can be considered as the average amount of information
necessary to "describe" the orbit in the unit time when you use a sufficiently
accurate measurement device.
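The construction above can be sketched numerically as follows. The sketch codes an orbit with respect to a partition of [0, 1] into intervals and estimates the information per step by the compression ratio. The logistic map, the single cut point at 0.5, and the use of `zlib` as the information function are all assumptions made for illustration; they are not the choices made in this paper.

```python
import zlib

def symbolic_orbit(T, x0, n, edges):
    """First n symbols of the coding of the orbit of x0; the partition is given by interval edges."""
    symbols = []
    x = x0
    for _ in range(n):
        # symbol = index of the partition element containing the current point
        symbols.append(sum(1 for e in edges if x >= e))
        x = T(x)
    return symbols

def information_bits(symbols):
    """Stand-in for I(x, alpha, n): compressed length in bits of the symbol string."""
    return 8 * len(zlib.compress(bytes(symbols), 9))

logistic = lambda x: 4.0 * x * (1.0 - x)   # illustrative chaotic map
identity = lambda x: x                      # every point is fixed

n = 4000
for name, T in [("logistic", logistic), ("identity", identity)]:
    s = symbolic_orbit(T, 0.3, n, edges=[0.5])
    print(name, information_bits(s) / n)    # finite-n estimate of I(x, alpha, n) / n
```

The printed ratios are only finite-n estimates of the limsup; the chaotic orbit needs many more bits per step than the fixed point.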

Notice that the complexity $K(x)$ of each orbit is defined independently of
the choice of an invariant measure. In the compact case, if $\mu$ is an invariant
measure on X then $\int_X K(x)\, d\mu$ equals the Kolmogorov-Sinai entropy. In
other words, in an ergodic dynamical system, for almost all points $x \in X$ and
for a suitable choice of $\alpha$, $I(x, \alpha, n) \sim h_\mu n$. Notice that this result holds for a
large class of information functions $I$, for example the AIC and the CIC.
Thus we have obtained an alternative way to understand the meaning of
the K-S entropy.

The above construction makes sense also for a non stationary system.
Its average over the space X is a generalization of the K-S entropy to the
non stationary case. Moreover, the asymptotic behavior of $I(x, \alpha, n)$ gives
an invariant of the dynamics which is finer than the K-S entropy and is
particularly relevant when the K-S entropy is null.

It is well known that the Kolmogorov-Sinai entropy is related to the
instability of the orbits. The exact relation between the K-S entropy and
the instability of the system is given by the Ruelle-Pesin theorem ([?]). We
will recall this theorem in the one-dimensional case. Suppose that the average
rate of separation of nearby starting orbits is exponential, namely

$$\Delta x(n) \simeq \Delta x(0)\, 2^{\lambda n} \quad \text{for } n \ll \infty$$

where $\Delta x(n)$ denotes the distance between the two orbits at time n. The number
$\lambda$ is called the Lyapunov exponent; if $\lambda > 0$ the system is unstable and $\lambda$ can be
considered a measure of its instability (or sensitivity with respect to the
initial conditions). The Ruelle-Pesin theorem implies that, under some
regularity assumptions, $\lambda$ equals the K-S entropy.
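For a differentiable one-dimensional map, the exponent in the separation law above can be estimated as the orbit average of the base-2 logarithm of the absolute value of the derivative. The sketch below does this for the logistic map with parameter 4, chosen here only as a familiar example (an assumption; it is not the system discussed in this section). For that map the exponent is known to be ln 2 per iteration, that is, 1 bit.

```python
import math

def lyapunov_exponent_bits(T, dT, x0, n, burn_in=100):
    """Orbit average of log2 |T'(x)|, an estimate of the exponent in base 2."""
    x = x0
    for _ in range(burn_in):          # discard the transient
        x = T(x)
    total = 0.0
    for _ in range(n):
        total += math.log2(max(abs(dT(x)), 1e-300))   # guard against log(0)
        x = T(x)
    return total / n

T = lambda x: 4.0 * x * (1.0 - x)     # illustrative map
dT = lambda x: 4.0 - 8.0 * x          # its derivative

print(lyapunov_exponent_bits(T, dT, 0.3, 100_000))   # close to 1 bit per iteration
```

Using base 2 keeps the estimate in the same units (bits per time step) as the information content measured by a binary code.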

There are chaotic dynamical systems whose entropy is null: usually they
are called weakly chaotic. Weakly chaotic dynamics arises in the study of
self-organizing systems, anomalous diffusion, long range interactions and many
others. In such dynamical systems the amount of information necessary to
describe n steps of an orbit is less than linear in n, thus the K-S entropy is not
sensitive enough to distinguish the various kinds of weakly chaotic dynamics.
Nevertheless, using the ideas we presented here, the relation between initial
data sensitivity and information content of the orbits can be generalized to
these cases.

To give an example of such a generalization, let us consider a dynamical
system $([0, 1], T)$ where the transition map T is constructive, and the
function $I(x, \alpha, n)$ is defined using the AIC in a slightly different way than in
Section ?? (see [?]). If the speed of separation of nearby starting orbits goes
like $\Delta x(n) \simeq \Delta x(0) f(x, n)$, then for almost all the points $x \in [0, 1]$ we have

$$I(x, \alpha, n) \sim \log(f(x, n)) .$$

In particular, if we have power law sensitivity ($\Delta x(n) \simeq \Delta x(0)\, n^p$), the
information content of the orbit is $I(x, \alpha, n) \sim p \log(n)$. If we have a stretched
exponential sensitivity ($\Delta x(n) \simeq \Delta x(0)\, 2^{\lambda n^p}$, $p < 1$), the information content
of the orbits will increase with the power law: $I(x, \alpha, n) \sim n^p$.

An example of stretched exponential is provided by the Manneville map
(see Section 4.1). The Manneville map is a discrete time dynamical system
which was introduced by [?] as an extremely simplified model of intermittent
turbulence in fluid dynamics. The Manneville map is defined on the unit
interval to itself by $T(x) = x + x^z \pmod 1$. When $z > 2$ the Manneville map
is weakly chaotic and non stationary. It can be proved [?], [?], [16] that for
almost each x (with respect to the Lebesgue measure)

$$I_{AIC}(x, \alpha, n) \sim n^{\frac{1}{z-1}} . \qquad (4)$$
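A minimal sketch of the Manneville map and of a symbolic coding of one of its orbits is given below. The two-set partition with cut point 0.5, the initial condition, and the use of `zlib` to measure the compressed length are assumptions made for illustration; the experiments reported in this paper use a different compression algorithm.

```python
import zlib

def manneville(x: float, z: float) -> float:
    """One step of T(x) = x + x**z (mod 1) on the unit interval."""
    return (x + x ** z) % 1.0

def symbolic_string(x0: float, z: float, n: int, cut: float = 0.5) -> bytes:
    """Symbol 0 if the orbit point is below `cut`, else 1 (two-set partition, an assumption)."""
    out = bytearray()
    x = x0
    for _ in range(n):
        out.append(0 if x < cut else 1)
        x = manneville(x, z)
    return bytes(out)

z = 3.0                                   # z > 2: the weakly chaotic regime
s = symbolic_string(0.3, z, 20_000)
for n in (2_500, 5_000, 10_000, 20_000):
    bits = 8 * len(zlib.compress(s[:n], 9))
    print(n, bits)                        # compressed length of the first n symbols
```

Tabulating the compressed length against n in this way is how the growth of the information content can be compared with the scaling law (4).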

1.3 Analysis of experimental data 

By the previous considerations, the analysis of $I(x, \alpha, n)$ gives useful
information on the underlying dynamics. Since $I(x, \alpha, n)$ can be defined through
the CIC, it can be used to analyze experimental data
using a compression algorithm which satisfies the properties required by the
theory and which is fast enough to analyze long strings of data. We have
implemented a particular compression algorithm which we called CASToRe:
Compression Algorithm Sensitive To Regularity. CASToRe is a modification of
the LZ78 algorithm. Its internal running and the heuristic motivations for
such an algorithm are described in the Appendix (see also [?]). We have used
CASToRe on the Manneville map and we have checked that the experimental
results agree with the theoretical one, namely with equation (4) (Section 4.1;
see also [?]). Then we have used it to analyze the behavior of $I(x, \alpha, n)$ for
the logistic map at the chaos threshold (Section 4.2, see also [?]).

^a constructive map is a map that can be defined using a finite amount of information, 
see M. 



Finally, we have applied CASToRe and the CIC analysis to DNA
sequences (Section ??), following the ideas of [?], [?], [?] concerning the
study of the randomness of symbolic strings produced by a biological source.
The cited authors applied classical statistical techniques, so we hope
that our approach will give rise both to new answers and new questions.



2 Information content and complexity 

2.1 Information content of finite strings 

We clarify the definition of Algorithmic Information Content that was
outlined in the Introduction. For a more precise exposition see for example [?]
and [28].

In the following, we will consider a finite alphabet $\mathcal{A}$, $a = \#(\mathcal{A})$ the
cardinality of $\mathcal{A}$, and the set $\Sigma(\mathcal{A})$ of finite strings from $\mathcal{A}$, that is
$\Sigma(\mathcal{A}) = \bigcup_{n=1}^{\infty} \mathcal{A}^n \cup \{\emptyset\}$. Finally, let $\Omega = \Omega_{\mathcal{A}}$ be the set of infinite strings
$\omega = (\omega_i)_{i\in\mathbb{N}}$ with $\omega_i \in \mathcal{A}$ for each i.

Let

$$C : \Sigma(\{0, 1\}) \to \Sigma(\mathcal{A})$$

be a partial recursive function. The intuitive idea of partial recursive function
is given in the Introduction. For a formal definition we refer to any textbook
of recursion theory.

The Algorithmic Information Content $I_{AIC}(s, C)$ of a string s relative to
C is the length of the shortest string p such that $C(p) = s$. The string p can
be imagined as a program given to a computing machine and the value $C(p)$
is the output of the computation. We require that our computing machine
is universal. Roughly speaking, a computing machine is called universal if
it can simulate any other machine (again, for a precise definition see any
book of recursion theory). In particular, if U and U' are universal then
$I_{AIC}(s, U) \le I_{AIC}(s, U') + const$, where the constant depends only on U and U'. This
implies that, if U is universal, the complexity of s with respect to U depends
only on s up to a fixed constant, and then its asymptotic behavior does not
depend on the choice of U.

As we said in the introduction, the shortest program which gives a string 
as its output is a sort of ideal encoding of the string. The information which 
is necessary to reconstruct the string is contained in the program. 

Unfortunately this coding procedure cannot be performed by any algorithm.
This is a very deep statement and, in some sense, it is equivalent to
the Turing halting problem or to the Gödel incompleteness theorem. Hence
the Algorithmic Information Content is not computable by any algorithm.

However, suppose we have some lossless (reversible) coding procedure
$Z : \Sigma(\mathcal{A}) \to \Sigma(\{0, 1\})$ such that from the coded string we can reconstruct
the original string (for example the data compression algorithms that are in
any personal computer). Since the coded string contains all the information
that is necessary to reconstruct the original string, we can consider the length
of the coded string as an approximate measure of the quantity of information
that is contained in the original string.

Of course not all the coding procedures are equivalent and give the same
performance, so some care is necessary in the definition of information
content. For this reason we introduce the notion of optimality of an algorithm Z,
defined by comparing its compression ratio with a statistical measure of
information: the empirical entropy. This quantity, which is related to the Shannon
entropy, is defined below.

Let s be a finite string of length n. We now define $H_l(s)$, the $l$-th empirical
entropy of s. We first introduce the empirical frequencies of a word in the
string s: let us consider $w \in \mathcal{A}^l$, a string from the alphabet $\mathcal{A}$ with length
$l$; let $s^{(m_1, m_2)} \in \mathcal{A}^{m_2 - m_1 + 1}$ be the string containing the segment of s starting
from the $m_1$-th digit up to the $m_2$-th digit; let

$$\delta(s^{(i+1, i+l)}, w) = \begin{cases} 1 & \text{if } s^{(i+1, i+l)} = w \\ 0 & \text{otherwise} \end{cases} \qquad (0 \le i \le n - l) .$$

The relative frequency of w (the number of occurrences of the word w
divided by the total number of $l$-digit subwords) in s is then

$$P(s, w) = \frac{1}{n - l + 1} \sum_{i=0}^{n-l} \delta(s^{(i+1, i+l)}, w) .$$

This can be interpreted as the "empirical" probability of w relative to the
string s. Then the $l$-empirical entropy is defined by

$$H_l(s) = -\frac{1}{l} \sum_{w \in \mathcal{A}^l} P(s, w) \log P(s, w) .$$

The quantity $l H_l(s)$ is a statistical measure of the average information
content of the $l$-digit long substrings of s.
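The definition of the empirical entropy translates directly into a short program. The sketch below counts the overlapping words of a given length in a string and applies the entropy formula to their relative frequencies. The logarithm is taken in base 2 (an assumption; the text does not fix the base), and words that do not occur are skipped since they contribute nothing to the sum.

```python
import math
from collections import Counter

def empirical_entropy(s: str, l: int) -> float:
    """Empirical entropy of order l: (1/l) * sum_w P(s, w) * log2(1 / P(s, w))."""
    n = len(s)
    if not 1 <= l <= n:
        raise ValueError("word length l must satisfy 1 <= l <= len(s)")
    total = n - l + 1                                   # number of l-digit subwords
    counts = Counter(s[i:i + l] for i in range(total))  # occurrences of each word w
    return sum((c / total) * math.log2(total / c) for c in counts.values()) / l

print(empirical_entropy("0000000000", 1))   # 0.0: a constant string
print(empirical_entropy("01" * 50, 1))      # 1.0: both symbols have empirical probability 1/2
print(empirical_entropy("01" * 50, 2))      # about 0.5: only two words of length 2 occur
```

The last two lines show why several word lengths are needed: the alternating string looks maximally random at order 1, while order 2 already reveals its regularity.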

The algorithm Z is coarsely optimal if its compression ratio $|Z(s)|/|s|$
tends to be less than or equal to $H_k(s)$ for each k.




Definition 1. A compression algorithm Z is coarsely optimal if $\forall k$ there is
$f_k$, with $f_k(n) = o(n)$, such that $\forall s$ it holds

$$|Z(s)| \le |s| H_k(s) + f_k(|s|) .$$

Remark 2. The universal coding algorithms LZ77 and LZ78 ([?], [?]) satisfy
Definition 1. For the proof see [?].

However, if the empirical entropy of the string is null (weak chaos) the
above definition is not satisfactory (see [?]), so we need an algorithm having
the same asymptotic behavior as the empirical entropy. In this way, even in
the weakly chaotic case our algorithm will give a meaningful measure of the
information.

Definition 3. A compression algorithm Z is optimal if there is a constant
$\lambda$ such that $\forall k$ there is a $g_k$ with $g_k(t) = o(t)$ such that $\forall s$ it holds

$$|Z(s)| \le \lambda |s| H_k(s) + g_k(|Z(s)|) .$$

Definition 4. The information content of s with respect to Z is defined as

$$I_Z(s) = |Z(s)| .$$

It is not trivial to construct an optimal algorithm. For instance, the well
known Lempel-Ziv compression algorithms are not optimal ([?]). However,
the set of optimal compression algorithms is not empty. In [?] we give an
example of an optimal compression algorithm that is similar to the
Kolmogorov frequency coding algorithm which is used also in [?]. This
compression algorithm is not of practical use because of its computational complexity.
To our knowledge the problem of finding a fast optimal compression
algorithm is still open.

2.2 Infinite strings and complexity 

Now we show how the various definitions of information content of finite
strings can be applied to define a notion of orbit complexity for dynamical
systems. This idea has already been exploited by Brudno ([?]). However our
construction to achieve this goal is different: we use the Computable Information
Content instead of the Algorithmic Information Content (as was done in
[15]) and computable partitions instead of open covers.

These modifications with respect to Brudno's definition have the
advantage of giving a quantity which is the limit of computable functions and
hence can be used in numerical experiments.




The relations we can prove between these notions and the entropy will 
be useful as a theoretical support for the interpretation of the experimental 
and numerical results. As in Brudno's approach, we will obtain that in the 
ergodic case the orbit complexity has a.e. the same value as the entropy. The 
reader will notice that the results which we will explain in this section are 
meaningful in the positive entropy case. The null entropy cases are harder to 
deal with, and they present many aspects which are not yet well understood. 
There are some results based on the AIC [16], but there are no theoretical
results based on the CIC which, in general, are mathematically more involved.
On the other hand, our definitions based on the CIC make sense and the related
quantities can be computed numerically; in Section 4, we will present some
facts based on numerical experiments.

First, let us consider a symbolic dynamical system (a dynamical system
on a set of infinite strings). A symbolic dynamical system is given by $(\Omega, \mu, \sigma)$,
where $\Omega = \mathcal{A}^{\mathbb{N}}$, that is $\omega \in \Omega$ implies that $\omega$ is an infinite sequence $(\omega_i)_{i\in\mathbb{N}}$
of symbols in $\mathcal{A}$, $\sigma$ is the shift map

$$\sigma((\omega_i)_{i\in\mathbb{N}}) = (\omega_{i+1})_{i\in\mathbb{N}}$$

and $\mu$ is an invariant probability measure on $\Omega$. A symbolic dynamical system
can be also viewed as an information source. For the purposes of this paper
the two notions can be considered equivalent.

The complexity of an infinite string $\omega$ is the average (over the whole string
$\omega$) of the quantity of information which is contained in a single digit of $\omega$ (cf.
(1)). The quantity of information of a string can be defined in different ways:
by statistics (empirical entropy), by computer science (Algorithmic
Information Content) or by compression algorithms. This will give three measures
of the complexity of infinite strings; each of them presents different
advantages depending on the problem to which it is applied.

Let $\omega$ be an infinite string in $\Omega$. Let $\omega^n = (\omega_1 \dots \omega_n)$ be the string
containing the first n digits of $\omega$.

Definition 5. If $\omega \in \Omega$ then the algorithmic information complexity of $\omega$ is
the average information content per digit

$$K_{AIC}(\omega) = \limsup_{n\to\infty} \frac{I_{AIC}(\omega^n)}{n} .$$

If Z is a compression algorithm we also define the computable complexity of
$\omega$ as the asymptotic compression ratio of Z

$$K_Z(\omega) = \limsup_{n\to\infty} \frac{|Z(\omega^n)|}{n} .$$
We also define the quantity $H(\omega)$. If $\omega$ is an infinite string, $H(\omega)$ is a sort
of Shannon entropy of the single string.

Definition 6. By the definition of empirical entropy we define:

$$H_l(\omega) = \limsup_{n\to\infty} H_l(\omega^n)$$

and

$$H(\omega) = \lim_{l\to\infty} H_l(\omega) .$$



The existence of this limit is proved in [?].
The following proposition is a direct consequence of ergodicity (for the
proof see again [?]).



Proposition 7. If $(\Omega, \mu, \sigma)$ is ergodic, then $H(\omega) = h_\mu(\sigma)$ (where $h_\mu$ is the
Kolmogorov-Sinai entropy) for $\mu$-almost each $\omega$.

Moreover, from the definition of coarse optimality it directly follows that:

Remark 8. If Z is coarsely optimal then for each k

$$K_Z(\omega) \le H_k(\omega) ,$$

so that

$$K_Z(\omega) \le H(\omega) .$$

Remark 9. As is intuitive, the compression ratio of Z cannot be less than
the average information per digit of the algorithmic information (see [?]):

$$\forall s \quad K_Z(s) \ge K_{AIC}(s) .$$

Then we have the following result. 

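Theorem 10. If $(\Omega, \mu, T)$ is a symbolic dynamical system and $\mu$ is ergodic,
then for $\mu$-almost each $\omega$

$$K_Z(\omega) = H(\omega) = K_{AIC}(\omega) = h_\mu(T) .$$

Proof. 1) We have that $K_Z(\omega) \ge K_{AIC}(\omega)$ (Remark 9), and by the Brudno
theorem ([?]) $K_{AIC}(\omega) = h_\mu(T)$ for $\mu$-almost each $\omega$; hence $K_Z(\omega) \ge h_\mu(T)$
for $\mu$-almost each $\omega$.

2) On the other hand, $K_Z(\omega) \le H(\omega)$ by Remark 8 (Z being coarsely optimal),
and by Proposition 7 $H(\omega) = h_\mu$ for $\mu$-almost each $\omega \in \Omega$. □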




3 Dynamical Systems 

3.1 Information and the Kolmogorov-Sinai entropy

Now we consider a dynamical system $(X, \mu, T)$, where X is a compact metric
space, T is a continuous map $T : X \to X$ and $\mu$ is a Borel probability
measure on X invariant for T. If $\alpha = \{A_1, \dots, A_n\}$ is a measurable partition
of X (a partition of X where the sets are measurable) then we can associate
to $(X, \mu, T)$ a symbolic dynamical system $(\Omega_\alpha, \mu_\alpha, \sigma)$ (called a symbolic
model of $(X, T)$). The set $\Omega_\alpha$ is a subset of $\{1, \dots, n\}^{\mathbb{N}}$ (the space of infinite
strings made of symbols from the alphabet $\{1, \dots, n\}$). To a point $x \in X$ is
associated a string $\omega = (\omega_i)_{i\in\mathbb{N}} = \Phi_\alpha(x)$ defined as

$$\Phi_\alpha(x) = \omega \iff \forall j \in \mathbb{N},\ T^j(x) \in A_{\omega_j}$$

and $\Omega_\alpha = \bigcup_{x\in X} \Phi_\alpha(x)$. The measure $\mu$ on X induces a measure $\mu_\alpha$ on the
associated symbolic dynamical system. The measure is first defined on the
cylinders^

$$C(\omega^{(k,n)}) = \{\overline{\omega} \in \Omega_\alpha : \overline{\omega}_i = \omega_i \text{ for } k \le i \le n-1\}$$

by

$$\mu_\alpha(C(\omega^{(k,n)})) = \mu\Big(\bigcap_{i=k}^{n-1} T^{-i}(A_{\omega_i})\Big)$$

and then extended by the classical Kolmogorov theorem to a measure $\mu_\alpha$ on
$\Omega_\alpha$.

Definition 11. We define the complexity of the orbit of a point $x \in X$, with
respect to the partition $\alpha$, as

$$K_{AIC}(x, \alpha) = K_{AIC}(\omega),$$

$$K_Z(x, \alpha) = K_Z(\omega),$$

where $\omega = \Phi_\alpha(x)$.

Theorem 12. If Z is coarsely optimal, $(X, \mu, T)$ is an ergodic dynamical
system and $\alpha$ is a measurable partition of X, then for $\mu$-almost all x

$$K_Z(x, \alpha) = h_\mu(T, \alpha)$$

where $h_\mu(T, \alpha)$ is the metric entropy of $(X, \mu, T)$ with respect to the
measurable partition $\alpha$.

^We recall that $\omega^{(k,n)} = (\omega_i)_{k \le i \le n} = (\omega_k, \omega_{k+1}, \dots, \omega_n)$.




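Proof. To a dynamical system $(X, \mu, T)$ and a measurable partition $\alpha$
is associated a symbolic dynamical system $(\Omega_\alpha, \mu_\alpha, \sigma)$ as seen before. If
$(X, \mu, T)$ is ergodic then $(\Omega_\alpha, \mu_\alpha, \sigma)$ is ergodic and $h_\mu(T, \alpha)$ on X equals
$h_{\mu_\alpha}(\sigma)$ on $\Omega_\alpha$ (see e.g. [?]). Now by Theorem 10, for almost all points in $\Omega_\alpha$,
$K_Z(\omega) = h_{\mu_\alpha}(\sigma)$. If we consider $Q_{\Omega_\alpha} := \{\omega \in \Omega_\alpha : K_Z(\omega) = h_{\mu_\alpha}(\sigma)\}$
and $Q := \Phi_\alpha^{-1}(Q_{\Omega_\alpha})$ we have

$$\forall x \in Q \quad K_Z(x, \alpha) = K_Z(\Phi_\alpha(x)) = h_{\mu_\alpha}(\sigma) = h_\mu(T, \alpha) .$$

According to the way in which the measure $\mu_\alpha$ is constructed we have
$\mu(Q) = \mu_\alpha(Q_{\Omega_\alpha}) = 1$. □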

Let $\beta_i$ be a family of measurable partitions such that $\lim_{i\to\infty} diam(\beta_i) = 0$.
If we consider $\limsup_{i\to\infty} K_Z(x, \beta_i)$ we have the following

Lemma 13. If $(X, \mu, T)$ is compact and ergodic, then for $\mu$-almost all points
$x \in X$, $\limsup_{i\to\infty} K_Z(x, \beta_i) = h_\mu(T)$.

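Proof. The points for which $K_Z(x, \beta_i) \ne h_\mu(T, \beta_i)$ are a set of null measure
for each i (Theorem 12). When excluding all these points, we exclude (for each
i) a zero-measure set; since the family $\beta_i$ is countable, the excluded set as a
whole still has measure zero. For all the other points we have
$K_Z(x, \beta_i) = h_\mu(T, \beta_i)$ for every i, and then
$\limsup_{i\to\infty} K_Z(x, \beta_i) = \limsup_{i\to\infty} h_\mu(T, \beta_i)$. Since the diameter of the
partitions $\beta_i$ tends to 0, we have that $\limsup_{i\to\infty} h_\mu(T, \beta_i) = h_\mu(T)$ (see e.g. [?],
page 170), and the statement is proved. □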



Remarks. Theorem 12 and the above lemma show that if a system has an
invariant measure, its entropy can be found by averaging the complexity of
its orbits over the invariant measure. Then, as we saw in the introduction,
the entropy may be alternatively defined as the average orbit complexity.
However, if we fix a single point, its orbit complexity is not yet well defined
because it depends on the choice of a partition. It is not possible to get
rid of this dependence by taking the supremum over all partitions (as in the
construction of the Kolmogorov entropy), because this supremum goes to infinity
for each orbit that is not eventually periodic (see [?]).

This difficulty may be overcome in two ways: 

1) by considering open covers instead of partitions. This approach was
proposed by Brudno [?]. Since the sets in an open cover can have non empty
intersection, a step of the orbit of x can be contained at the same time in more
than one open set of the cover. This implies that an orbit may have an infinite
family of possible symbolic codings, among which we choose the "simplest
one";

2) by considering only a particular class of partitions (computable
in a sense that will be clarified later) and defining the orbit complexity of
a point as the supremum of the orbit complexity over that class.

Brudno's open cover construction is not suitable for computational pur- 
poses because the choice of the simplest coding in a big family is not practi- 
cally feasible. 

On the other hand, the computable partition approach is the
mathematical formalization of what we do in computer simulations. We consider a
dynamical system $(X, T)$ and we choose a partition $\beta$, which is always
computable when it is explicitly given (the formal definition will be given in the
next section). We consider the symbolic orbit of a point $x \in X$ with respect to
$\beta$: it is a single string, and we measure its information content by some suitable
universal coding algorithm.

3.2 Computable partitions 

In this section we will give a rigorous definition of computable partition.
This notion is based on the idea of computable structure, which relates the
abstract notion of metric space with computer simulations. Before giving
the formal definitions, a few words are necessary to describe the intuitive idea
behind it. Many models of the real world use the notion of real numbers or,
more in general, the notion of complete metric spaces. Even a very simple
complete metric space, for example the interval [0, 1], contains a continuum
of elements. This fact implies that most of these elements (numbers) cannot
be described by a finite string in any finite alphabet. Nevertheless, in general,
the mathematics of complete metric spaces is simpler than "discrete
mathematics" for building models and proving the related theorems. On the other
hand, discrete mathematics allows one to make computer simulations. A first
connection between the two worlds is given by the theory of approximation.
But this connection becomes more delicate when we need to simulate more
sophisticated objects of continuum mathematics. For example, an open cover
or a measurable partition of [0, 1] is very hard to simulate by computer;
nevertheless, these notions are crucial in the definition of many quantities,
e.g. the K-S entropy of a dynamical system or the Brudno complexity
of an orbit. For this reason, we have introduced the notion of "computable
structure", which is a new way to relate the world of continuous models with
the world of computer simulations.

To simplify notations in the following, we will denote by $\Sigma$ the set of finite
strings in a finite alphabet. $\Sigma$ is the mathematical abstraction of the
world of the "computer", or more in general the mathematical abstraction
of the "things" which can be expressed by any language. We suppose that
the real objects which we want to talk about are modeled by the elements of
a metric space $(X, d)$. We want to interpret the objects of $\Sigma$ as points of X.
Thus, the first definition we need is the following one:

Definition 14. An interpretation function on $(X, d)$ is a function $\Psi : \Sigma_0 \to X$
such that $\Psi(\Sigma_0)$ is dense in X. ($\Sigma_0 \subset \Sigma$ is supposed to be a recursive set,
namely a set for which there is an algorithm which "decides" whether a given
element belongs to the set or not.)

For example, if $X = [0, 1]$, a possible choice of $\Psi$ is the following one: if
$s = s_1 \dots s_n \in \Sigma = \Sigma_0 = \Sigma(\{0, 1\})$, then

$$\Psi(s) = \sum_{i=1}^{n} s_i 2^{-i} . \qquad (5)$$
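The example interpretation (5) reads a binary string as the binary expansion of a point of [0, 1]. A minimal sketch is given below; the function name `psi` and the use of exact rational arithmetic are choices made for this illustration.

```python
from fractions import Fraction

def psi(s: str) -> Fraction:
    """Psi(s) = sum_{i=1}^{n} s_i * 2**(-i) for a binary string s = s_1 ... s_n."""
    if set(s) - {"0", "1"}:
        raise ValueError("s must be a binary string")
    return sum((Fraction(int(bit), 2 ** i) for i, bit in enumerate(s, start=1)), Fraction(0))

print(psi("1"))      # 1/2
print(psi("101"))    # 5/8
print(psi("0011"))   # 3/16
```

The images of all binary strings are the dyadic rationals in [0, 1), which form a dense subset of the interval, as an interpretation function requires.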

A point $x \in X$ is said to be ideal if it is the image of some string,
$x = \Psi(s)$, $s \in \Sigma_0$. Clearly almost every point is not ideal; however, it is
possible to perform computations on these points since they can be
approximated with arbitrary precision by ideal points, provided that the
interpretation is consistent with the distance d. This consideration leads us to the
notion of "computable interpretation": an interpretation is "computable" if
the distance between ideal points is computable with arbitrary precision. The
precise definition is the following:

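Definition 15. A computable interpretation function on $(X, d)$ is a function
$\Psi : \Sigma_0 \to X$ such that $\Psi(\Sigma_0)$ is dense in X and there exists a total recursive
function $D : \Sigma_0 \times \Sigma_0 \times \mathbb{N} \to \mathbb{R}$ such that $\forall s_1, s_2 \in \Sigma_0$, $n \in \mathbb{N}$

$$|d(\Psi(s_1), \Psi(s_2)) - D(s_1, s_2, n)| < \frac{1}{2^n} .$$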

If we take again $X = [0, 1]$, we may have a different interpretation
considering a string in $\Sigma$ as an ASCII string. In this case $\Sigma_0$ is the set of ASCII
strings which denote a rational number in [0, 1]. In this way we obtain a
different computable interpretation $\Psi : \Sigma_0 \to X$ which describes the same metric
space. For most practical purposes, these two computable interpretations are
essentially equivalent: they represent the same computable structure. A
computable structure on a separable metric space $(X, d)$ is a class of equivalent
computable interpretations; two interpretations are said to be equivalent if
the distance between their ideal points is computable up to arbitrary precision.
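To illustrate what "computable up to arbitrary precision" means, the sketch below implements a distance function D for the dyadic interpretation of binary strings on [0, 1]. It assumes that the required accuracy at precision n is an error smaller than 2^(-n); the names `psi` and `D` and the rounding strategy are choices made for this example.

```python
from fractions import Fraction

def psi(s: str) -> Fraction:
    """Dyadic interpretation of a binary string as a point of [0, 1]."""
    return sum((Fraction(int(b), 2 ** i) for i, b in enumerate(s, start=1)), Fraction(0))

def D(s1: str, s2: str, n: int) -> Fraction:
    """Rational approximation of d(psi(s1), psi(s2)) with error below 2**(-n)."""
    exact = abs(psi(s1) - psi(s2))
    scale = 2 ** (n + 1)
    return Fraction(round(exact * scale), scale)   # round to the grid of step 2**-(n+1)

s1, s2 = "1011", "0110"
for n in (1, 2, 4):
    error = abs(abs(psi(s1) - psi(s2)) - D(s1, s2, n))
    print(n, D(s1, s2, n), error < Fraction(1, 2 ** n))
```

For this interpretation the distance is in fact exactly computable, so the rounding step only mimics the general situation in which D returns approximations of increasing accuracy.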
Definition 16. Let $\Psi_1 : \Sigma_1 \to X$ and $\Psi_2 : \Sigma_2 \to X$ be two computable
interpretations in $(X, d)$; we say that $\Psi_1$ and $\Psi_2$ are equivalent if there exists
a total recursive function $D^* : \Sigma_1 \times \Sigma_2 \times \mathbb{N} \to \mathbb{R}$ such that
$\forall (s_1, s_2) \in \Sigma_1 \times \Sigma_2$, $n \in \mathbb{N}$:

$$|d(\Psi_1(s_1), \Psi_2(s_2)) - D^*(s_1, s_2, n)| < \frac{1}{2^n} .$$

Proposition 17. The relation defined by Definition 16 is an equivalence
relation.

For the proof of this proposition see [14].



Definition 18. A computable structure $\mathcal{I}$ on X is an equivalence class of
computable interpretations in X.

Many concrete metric spaces used in analysis or in geometry have a natural
choice of a computable structure. The use of computable structures allows
one to consider algorithms acting over metric spaces and to define constructive
functions between metric spaces (see [?] or [?]).

For example if $X = \mathbb{R}$ we can consider the interpretation $\Psi : \Sigma \to \mathbb{R}$
defined in the following way: if $s = s_1 \ldots s_n \in \Sigma$ then

$$\Psi(s) = \sum_{1 \le i \le n} s_i 2^{[n/2] - i}. \tag{6}$$

This is an interpretation of a string as a binary expansion of a number.
$\Psi$ is a computable interpretation; it defines the standard computable
structure on $\mathbb{R}$.
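The binary-expansion interpretation can be tried out directly. The following is a minimal sketch, assuming that the exponent in (6) is $[n/2] - i$ with $[\cdot]$ the integer part and that strings are written with the characters '0' and '1'; the function names `psi` and `distance` are illustrative and do not come from the paper. Exact rational arithmetic is used, so the distance between two ideal points is computed without error, which is stronger than the $2^{-n}$ precision required of the function $D$.

```python
from fractions import Fraction


def psi(s: str) -> Fraction:
    """Interpret s = s_1...s_n as the sum of s_i * 2**([n/2] - i)."""
    if any(c not in "01" for c in s):
        raise ValueError("s must be a binary string")
    half = len(s) // 2  # integer part [n/2] (assumed reading of eq. (6))
    total = Fraction(0)
    for i, c in enumerate(s, start=1):
        total += int(c) * Fraction(2) ** (half - i)
    return total


def distance(s1: str, s2: str) -> Fraction:
    """Exact distance between the ideal points psi(s1) and psi(s2)."""
    return abs(psi(s1) - psi(s2))


if __name__ == "__main__":
    print(psi("1011"))             # 11/4, i.e. 10.11 in binary
    print(distance("1011", "10"))  # 7/4
```

Since `distance` is exact, it can play the role of $D(s_1, s_2, n)$ for every $n$ in this particular example.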

Definition 19. Let $X$ be a space with a computable structure $\mathcal{I}$ and $\Psi \in \mathcal{I}$.
A partition $\beta = \{B_i\}$ is said to be computable if there exists a recursive
$Z : \Sigma \to \mathbb{N}$ such that for each $s$, $Z(s) = j \iff \Psi(s) \in B_j$.

For example let us consider the following partition $P_{s_0,n} = \{B^n_{s_0}, X - B^n_{s_0}\}$
with



Bl = {^is),Diso,s,n + 2) < ^} - {^(s), D{so,s,n + 2) > 1}. 

We remark that if $n$ goes to infinity, the diameter of the set $B^n_{s_0}$ goes to
0. Moreover since $X$ is compact, for each $n$ there is a finite set of strings
$S_n = \{s_0, \ldots, s_k\}$ such that $X = \bigcup_{s \in S_n} B^n_s$. Since the join of a finite
family of computable partitions is computable, there is a family of partitions
$\alpha_n = \bigvee_{s \in S_n} P_{s,n}$ such that $\alpha_n$ is for each $n$ a finite computable partition and
$\lim_{n \to \infty} \mathrm{diam}(\alpha_n) = 0$. This will be used in the proof of the next theorem.

3.3 Computable complexity of an orbit

Using the notion of computable partition, it is possible to define the
complexity of a single orbit using only notions from information theory. Since
we have two different notions of complexity of a string (namely $K_{AIC}(s)$ and
$K_Z(s)$), we get two different notions of complexity of an orbit:

Definition 20. If $(X, T)$ is a dynamical system over a metric space $(X, d)$
with a computable structure, we define the computable complexity of the orbit
of $x$ as:

$$K_{AIC}(x) = \sup\{K_{AIC}(x, \beta) \mid \beta \text{ computable partition}\}$$
$$K_Z(x) = \sup\{K_Z(x, \beta) \mid \beta \text{ computable partition}\}.$$

Theorem 21. If $(X, \mu, T)$ is a dynamical system on a compact space and $\mu$
is ergodic, then for $\mu$-almost each $x$,

$$K_Z(x) = K_{AIC}(x) = h_\mu(T).$$

Proof. By the remarks above, for each $\epsilon$ there is a computable partition
with diameter less than $\epsilon$. Since computable partitions are a countable
set, the statement follows from lemma [?]. □

The above theorem states that $K_{AIC}(x)$ and $K_Z(x)$ are the right quantities
to consider; in fact, in dynamical systems which are stationary and
ergodic they coincide with the K-S entropy for a.e. $x$. However, the basic
point is that $K_Z(x)$, in principle, can be computed by a machine with
arbitrary accuracy.

We call the function $K_Z(x)$ the computable complexity of an orbit. For
stationary and ergodic dynamical systems, it is equivalent to the K-S entropy;
nevertheless, it has a broader applicability and it presents the following
features:

• it is the limit of computable functions and it can be measured in real 
experiments; 

• it is independent of the invariant measure; 

• it makes sense even if the space X is not compact; 

• it makes sense even if the dynamics is not stationary.

4 Numerical Experiments

Weakly chaotic dynamical systems give symbolic orbits with null entropy.
For these systems the quantity of information contained in n steps of the
symbolic orbit grows less than linearly in n. However there is
a big difference between a periodic dynamical system and a sporadic one
(for example the Manneville map, Section 4.1). In fact, the latter can have
positive topological entropy and sensitive dependence on initial conditions.

Thus it is important to have a numerical way to detect weak chaos and
to classify the various kinds of chaotic behavior.

We have implemented a particular compression algorithm which we called
CASToRe (Compression Algorithm, Sensitive To Regularity). Here we present
some experiments. First we used CASToRe in the study of the Manneville
map. The information behavior of the Manneville map is known from the works
[17], [?] (where the AIC has been used). We will see that our methods,
implemented by a computer simulation, give results which coincide
with the theoretical predictions of the mentioned papers. In the second
example, CASToRe is used to measure the kind of chaotic behavior
of the logistic map at the chaos threshold. Previous numerical works from the
physics literature showed that the logistic map has power-law sensitivity
to initial conditions, which implies logarithmic growth of the quantity of
information. Our numerical results strongly suggest that in such a dynamical
system the behavior of the quantity of information is below any power law,
confirming the previous results.

4.1 The Manneville map

The Manneville map was introduced by Manneville in [?] as an example
of a discrete dissipative dynamical system with intermittency, an alternation
between long regular phases, called laminar, and short irregular phases, called
turbulent. This behavior has been observed in fluid dynamics experiments and
in chemical reactions. Manneville introduced his map, defined on the interval
$I = [0, 1]$ by

$$f(x) = x + x^z \pmod 1, \qquad z > 1, \tag{7}$$

to have a simple model displaying this complicated behavior (see Figure 1).
His work has attracted much attention, and the dynamics of the Manneville
map has been found in many other systems. We can find applications of the
Manneville map in dynamical approaches to DNA sequences ([1], [?]) and ion
channels ([?]), and in non-extensive thermodynamic problems ([11]).

Figure 1: The Manneville map f for z = 2



Gaspard and Wang ([17]) described the behavior of the algorithmic
complexity of the Manneville map, and called such a behavior sporadicity. A
mathematical study of this kind of behavior has been done in a more general
context ([?]) and using different methods ([16]).

Let us consider the following partition of the interval $[0, 1]$. Let $x_0$ be such
that $f(x_0) = 1$ and $x_0 \neq 1$ (see Figure 1); then we call $A_0 = (x_0, 1]$ and
$A_1 = [0, x_0]$. We can now recursively define points $x_k$ such that $f(x_k) =
x_{k-1}$ and $x_k < x_{k-1}$. Then we consider the partition $\alpha = (A_k)$ where $A_k =
(x_k, x_{k-1})$, for $k \in \mathbb{N}$. We denote by $I_{AIC}(x, \alpha, n)$ the Algorithmic Information
Content of an n-long orbit of the Manneville map with initial condition $x$,
using the partition $\alpha$. We have that the mean value of $I_{AIC}(x, \alpha, n)$, with
respect to the Lebesgue measure $l$, on the initial conditions of the orbit is
$E_l[I_{AIC}(x, \alpha, n)] \sim n^p$, with $p = \frac{1}{z-1}$ for $z > 2$, and $E_l[I_{AIC}(x, \alpha, n)] \sim n$ for
$z < 2$.
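The construction of the points $x_k$ and of the symbolic orbit can be sketched as follows. This is only an illustrative sketch, not the code used for the experiments: the helper names are hypothetical, the preimages are found by bisection on the left branch of the map, cells are taken closed on the right (an assumption, since the endpoints have measure zero), and only finitely many breakpoints are computed, so all points below the last one share a single symbol.

```python
def manneville(x: float, z: float) -> float:
    """One step of f(x) = x + x**z (mod 1)."""
    return (x + x ** z) % 1.0


def preimage_left(y: float, z: float) -> float:
    """Solve x + x**z = y on [0, 1] by bisection (x + x**z is increasing)."""
    lo, hi = 0.0, 1.0
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if mid + mid ** z < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def breakpoints(z: float, count: int) -> list:
    """x_0 > x_1 > ... with x_0 + x_0**z = 1 and f(x_k) = x_{k-1}."""
    xs = [preimage_left(1.0, z)]
    while len(xs) < count:
        xs.append(preimage_left(xs[-1], z))
    return xs


def symbol(x: float, xs: list) -> int:
    """0 if x > x_0, k if x_k < x <= x_{k-1}, len(xs) below the last point."""
    for k, xk in enumerate(xs):
        if x > xk:
            return k
    return len(xs)


def symbolic_orbit(x: float, z: float, n: int, xs: list) -> list:
    """First n symbols of the orbit of x with respect to the partition."""
    out = []
    for _ in range(n):
        out.append(symbol(x, xs))
        x = manneville(x, z)
    return out


if __name__ == "__main__":
    xs = breakpoints(2.0, 50)
    print(xs[0])  # about 0.618 for z = 2
    print(symbolic_orbit(0.3, 2.0, 20, xs))
```

The resulting list of symbols is the string to which a compression algorithm would then be applied.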

Experiments performed with the compression algorithm CASToRe ([?])
confirm the theoretical results and indicate that the methods
related to the computable complexity are experimentally reliable.
We considered a set of one hundred initial points, generated 10^-long orbits, 
and applied the algorithm to the associated symbolic strings $s$. If we take
the compression algorithm $Z$ = CASToRe, we have that $I_Z(s)$ is a good
approximation of $I_{AIC}(x, \alpha, n)$.

In Table 1 we show the results. The first column is the value of the
parameter $z$. The last column gives the results of the theory for the exponent
of the asymptotic behavior of $I_{AIC}(x, \alpha, n)$. The second and third columns
show the experimental results. Given the functions $I_Z(s)$, with $Z$ = CASToRe,
we took the mean of the obtained values for the exponents $p$ using the Lebesgue
measure (second column) and the invariant density given by the system (third
column) ([?]).

Figure 2: The experimental results for seven different orbits generated by the
Manneville map with z = 4

  z     uniform distribution   invariant distribution   theoretical value
  2.8   0.573                  0.551                    0.555
  3     0.533                  0.509                    0.5
  3.5   0.468                  0.449                    0.4
  4     0.409                  0.381                    0.333

Table 1: Theoretical and experimental results for the Manneville map

These experiments also seem to show, as indicated by the theory ([?]),
that the Algorithmic Information Content $I_{AIC}(x, \alpha, n)$ of strings generated
by the Manneville map is such that $I_{AIC}(x, \alpha, n) \sim E_l[I_{AIC}(x, \alpha, n)]$ for
almost any initial condition with respect to the Lebesgue measure $l$. In Figure
2, we show the experimental results for the Manneville map with $z = 4$. On
the left are plotted the functions $I_Z(s)$, with $Z$ = CASToRe, for seven
different initial conditions, and on the right there is the mean of the
functions and a straight line with slope 0.33, showing the asymptotic behaviour $n^p$.
Notice that the functions are plotted in logarithmic coordinates, so $p$ is
the slope of the lines.
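Reading the exponent $p$ off a plot in logarithmic coordinates amounts to a straight-line fit of $\log I_Z$ against $\log n$. The sketch below shows one way to do this with an ordinary least-squares fit; it is an assumption that such a fit is adequate here, the function names are illustrative, and the data in the usage example are synthetic rather than taken from the experiments. The theoretical value $p = 1/(z-1)$ for $z > 2$ is the one stated in the text.

```python
import math


def loglog_slope(ns, values):
    """Least-squares slope of log(values) against log(ns)."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(v) for v in values]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def theoretical_exponent(z: float) -> float:
    """Exponent p of E[I_AIC] ~ n**p as stated in the text."""
    if z > 2:
        return 1.0 / (z - 1.0)
    if z < 2:
        return 1.0
    raise ValueError("the text gives no exponent for z = 2")


if __name__ == "__main__":
    # Synthetic data following an exact power law, for illustration only.
    ns = [2 ** k for k in range(5, 15)]
    values = [3.0 * n ** 0.4 for n in ns]
    print(loglog_slope(ns, values))   # close to 0.4
    print(theoretical_exponent(3.5))  # 0.4
```

With real compression data the fit should be restricted to large $n$, since the power law is only asymptotic.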

4.2 The logistic map 

We studied the algorithmic complexity also for the logistic map defined by

$$f(x) = \lambda x(1 - x), \qquad x \in [0, 1], \quad 1 < \lambda < 4. \tag{8}$$

The logistic map has been used to simulate the behavior of biological
species not in competition with other species. Later the logistic map has
also been presented as the first example of a relatively simple map with an
extremely rich dynamics. If we let the parameter $\lambda$ vary from 1 to 4, we
find a sequence of bifurcations of different kinds. For values of $\lambda < \lambda_\infty =
3.56994567187\ldots$, the dynamics is periodic and there is a sequence of period
doubling bifurcations which leads to the chaos threshold for $\lambda = \lambda_\infty$. The
dynamics of the logistic map at the chaos threshold has attracted much
attention, and many theories have been applied to the logistic map at
this particular value of the parameter $\lambda$. In particular, numerical experiments
suggested that at the chaos threshold the logistic map has null K-S entropy,
implying that the sensitivity to initial conditions is less than exponential, and
that there is a power-law sensitivity to initial conditions. These facts have justified
the application of generalized entropies to the map ([?], [?]). Moreover, from
the relations between initial conditions sensitivity and information content
([?]), we expect to find that the Algorithmic Information Content of the
logistic map at the chaos threshold is such that $I_{AIC}(s) \sim \log(n)$ for an
n-digit long symbolic string $s$ generated by one orbit of the map. We next show
how we have experimentally found this result.

From now on $Z$ indicates the compression algorithm CASToRe. It is
known that for periodic maps the behavior of the Algorithmic Information
Content should be of order $O(\log n)$ for an n-long string, and it has been
proved that the compression algorithm CASToRe gives for periodic strings
$I_Z(s) \sim \Pi(n) = \log n \log(\log n)$, where we recall that $I_Z(s)$ is the binary
length of the compressed string ([?]). In Figure 3, we show the approximation
of $I_Z(s)$ with $\Pi(n)$ for a 10^-long periodic string of period 100.

We have thus used the sequence $\lambda_n$ of parameter values where the
period doubling bifurcations occur and used a "continuity" argument to obtain
the behavior of the information function at the chaos threshold. Another
sequence of parameter values $\mu_k$ approximating the critical value $\lambda_\infty$ from above
has been used to confirm the results.

In Figure 4, we plotted the functions $S(n) = \frac{I_Z(s)}{\Pi(n)}$ for some values of the
two sequences. The starred functions refer to the sequence $\mu_k$ and the others
to the sequence $\lambda_n$. The solid line shows the limit function $S_\infty(n)$. If we now
consider the limit for $n \to +\infty$, we conjecture that $S_\infty(n)$ converges to a
constant $S_\infty$, whose value is approximately 3.5. Then we can conclude that,
at the chaos threshold, the Algorithmic Information Content of the logistic
map is $I_{AIC}(s) \le I_Z(s) \sim S_\infty \Pi(n)$. In particular we notice that we obtained
an Algorithmic Information Content whose order is smaller than any power
law, and we called this behavior mild chaos ([?]).

Figure 3: The solid line is the information function for a string $\sigma$ of period
100 compared with $\Pi(n)$ plus a constant term

Figure 4: The solid line is the limit function $S_\infty(n)$ and dashed lines are some
approximating functions from above (with a star) and below.

5 DNA sequences

We regard genomes as finite symbolic sequences over the alphabet of
nucleotides {A, C, G, T} and compute the complexity $K_Z$ (where
$Z$ is the algorithm CASToRe) of some sequences or parts of them.

DNA sequences, in fact, can be roughly divided into different functional
regions. First, let us analyze the structure of a Prokaryotic genome: a gene
is not directly translated into protein, but is expressed via the production
of a messenger RNA. It includes a sequence of nucleotides that corresponds
exactly with the sequence of amino acids in the protein (this is the so-called
colinearity of prokaryotic genes). These parts of the genome are the coding
regions. The regions of a prokaryotic genome that are not coding are the non
coding regions: upstream and downstream regions, according to whether they
precede or follow the gene.

On the other hand, Eukaryotic DNA sequences have several non coding
regions: a gene includes additional sequences that lie within the coding region,
interrupting the sequence that represents the protein (this is why these are
called interrupted genes). The sequences of DNA comprising an interrupted gene
are divided into two categories:

• the exons are the sequences represented in the mature RNA and they 
are the real coding region of the gene, that starts and ends with exons; 

• the introns are the intervening sequences which are removed when the 
first transcription occurs. 

So, the non coding regions in Eukaryotic genomes are intron sequences
and up/downstream sequences. The last two regions are usually called
intergenic regions. In bacterial and viral genomes, coding regions are more
extensive than in Eukaryotic genomes, where non coding regions prevail.

There is a long-standing interest in understanding the correlation
structure between bases in DNA sequences. Statistical heterogeneity has been
investigated separately in coding and non coding regions: long-range
correlations were found in intron and even more in intergenic regions, while
exons were almost indistinguishable from random sequences ([?], [13]). Our
approach can be applied to look for the non-extensivity of the Information
content corresponding to the different regions of the sequences.

We have used a modified version of the algorithm CASToRe, that exploits 
a window segmentation (see Appendix); let L be the length of a window. We 
measure the mean complexity of substrings with length L belonging to the 
sequence that is under analysis. Then, we obtain the Information and the 
complexity of the sequence as functions of the length L of the windows.

Figure 5: On the left, the complexity of coding sequences of the bacterium
Escherichia coli as a function of the length L of the windows is compared to
that of a statistically equivalent random sequence. On the right, the same
comparison for the intergenic (non coding) sequence of the same genome.

Figure 6: The analysis of the three functional regions of the genome of the
eukaryote Saccharomyces cerevisiae: the solid line is relative to the coding
regions, the dashed line to the intergenic regions, the dotted line to the intron
region. The complexities, as functions of the length L of the windows, are clearly
different from each other. The regions do not have similar lengths, so the picture
has been drawn according to the shortest region.

In analogy with physical language, we call a function $f$ extensive if the
sum of the evaluations of $f$ on each part of a partition of the domain equals
the evaluation of $f$ on the whole domain:

$$\sum_{i \in I} f(A_i) = f\Big(\bigcup_{i \in I} A_i\Big).$$

In case of a non-extensive function $f$, we have that the average on the
different parts underestimates the evaluation on the whole domain:

$$\sum_{i \in I} f(A_i) \le f\Big(\bigcup_{i \in I} A_i\Big).$$

Now let us consider the Information Content: each set $A_i$ is a window of
fixed length $L$ in the genome, so it can be considered as $A_i(L)$. Then the
related complexity is $\frac{I(A_i(L))}{L}$.

If a sequence is chaotic or random, then its Information content is
extensive, because it has no memory (neither short-range nor long-range
correlations). But if a sequence shows correlations, then the more long-ranged the
correlations are, the more non-extensive is the related Information content.
We have that:

• if the Information content is extensive, the complexity is constant as a
function of the length L;

• if the Information content is non-extensive, the complexity is a
decreasing function of the length L, since the Information content grows less
than linearly.

From the experimental point of view, we expect our results to show that
in coding regions the Information content is extensive, while in non coding
regions the extensivity is lost within a certain range $[0, L^*]$ of window length
(the number $L^*$ depends on the genome). This is also supported by the
statistical results presented above.

In coding sequences, we found that the complexity is almost constant:
the compression ratio does not change appreciably with L. In non coding
sequences, the complexity decreases until some appropriate $L^*$ is reached
and later the compression ratio is almost constant.
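The measurement described above, the mean complexity of windows of length L as a function of L, can be sketched in a few lines. The sketch below is only illustrative: it uses `zlib` from the Python standard library as a stand-in compressor, which is an assumption made for the sake of a self-contained example and is not CASToRe, and the function names are hypothetical. Any function returning the compressed length in bits of a window can be passed as `info`.

```python
import random
import zlib


def zlib_bits(window: str) -> int:
    """Compressed length in bits, using zlib as a stand-in for CASToRe."""
    return 8 * len(zlib.compress(window.encode("ascii"), 9))


def windows(sequence: str, length: int) -> list:
    """Consecutive non-overlapping windows; an incomplete tail is dropped."""
    last_start = len(sequence) - length
    return [sequence[i:i + length] for i in range(0, last_start + 1, length)]


def mean_complexity(sequence: str, length: int, info=zlib_bits) -> float:
    """Mean of info(window) / length over all windows of the given length."""
    parts = windows(sequence, length)
    if not parts:
        raise ValueError("window longer than sequence")
    return sum(info(w) for w in parts) / (len(parts) * length)


if __name__ == "__main__":
    rng = random.Random(0)
    random_seq = "".join(rng.choice("ACGT") for _ in range(4000))
    periodic_seq = "ACGT" * 1000
    for L in (250, 500, 1000, 2000):
        print(L, mean_complexity(random_seq, L), mean_complexity(periodic_seq, L))
```

For a memoryless random sequence the printed complexity should change little with L apart from the compressor's fixed overhead, while for a strongly correlated sequence it keeps decreasing as L grows.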

This gives information-theoretic evidence (alternative to the statistical
technique) that coding sequences are more chaotic than non coding ones. Figure
5 shows the complexity of coding (on the left) and non coding (on the right)
regions of the genome of the bacterium Escherichia coli as a function of the
length L of the windows, compared with statistically equivalent random
sequences. Clearly the compression ratio decreases more in the non coding
regions than in the coding ones, whereas the random sequences show almost
constant complexity.

Figure 6 shows the analysis of the three functional regions of the genome
of Saccharomyces cerevisiae, which is a eukaryote: the complexities, as
functions of the length L of the windows, are clearly different from each other. We
remark that the lower the compression ratio, the higher the aggregation
of the words in the genome. This is due to the fact that the algorithm
CASToRe recognizes patterns already read, so it is not necessary to introduce new
words: coupling old words to give rise to a new longer one is sufficient to
encode the string.



Appendix: CASToRe 

The Lempel-Ziv LZ78 coding scheme is traditionally used to encode a string
according to a so-called incremental parsing procedure [?]. The algorithm
divides the sequence into words that are all different from each other and whose
union is the dictionary. A new word is the longest word in the charstream
that is representable as a word already belonging to the dictionary together
with one symbol (that is the ending suffix).

We remark that the algorithm LZ78 encodes a constant n-digit long
sequence '111 . . .' to a string with length about const $+\, n^{1/2}$ bits, while the
theoretical Information $I_{AIC}$ is about const $+ \log n$. So, we cannot expect
LZ78 to be able to distinguish a sequence whose Information grows like $n^\alpha$
($\alpha < \frac{1}{2}$) from a constant or periodic one.
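The square-root behavior on constant sequences can be checked with a few lines of code. The following is a minimal sketch of the LZ78 incremental parsing only (it counts the parsed words and does not produce a bit-level encoding); the function name is illustrative. On a constant string the k-th word has length k, so k words cover k(k+1)/2 symbols and the number of words grows like $\sqrt{2n}$.

```python
import math


def lz78_phrases(s: str) -> list:
    """Incremental parsing: each new word is a known word plus one symbol."""
    dictionary = set()
    phrases = []
    word = ""
    for ch in s:
        word += ch
        if word not in dictionary:
            dictionary.add(word)
            phrases.append(word)
            word = ""
    if word:
        # Trailing piece of the stream that repeats a word already seen.
        phrases.append(word)
    return phrases


if __name__ == "__main__":
    print(lz78_phrases("1" * 10))  # ['1', '11', '111', '1111']
    for n in (10 ** 2, 10 ** 3, 10 ** 4):
        print(n, len(lz78_phrases("1" * n)), math.sqrt(2 * n))
```

The word count therefore cannot separate a constant sequence from one whose information grows like a small power of n, which is the limitation discussed above.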

This is the main motivation which led us to create the new algorithm
CASToRe. It has been proved in [?] that the Information of a constant
sequence, originally with length n, is $4 + 2\log(n + 1)[\log(\log(n + 1)) - 1]$, if
CASToRe is used. As has been shown in Section 4.1, the new algorithm
is also a sensitive device for weak dynamics.

CASToRe is an encoding algorithm based on an adaptive dictionary.
Roughly speaking, this means that it translates an input stream of
symbols (the file we want to compress) into an output stream of numbers, and
that it is possible to reconstruct the input stream knowing the correspondence
between output and input symbols. This unique correspondence between
sequences of symbols (words) and numbers is called the dictionary.

"Adaptive" means that the dictionary depends on the file under
compression; in this case the dictionary is created while the symbols are translated.

At the beginning of the encoding procedure, the dictionary is empty. In order
to explain the principle of encoding, let us consider a point within the encoding
process, when the dictionary already contains some words.

We start analyzing the stream, looking for the longest word W in the
dictionary matching the stream. Then we look for the longest word Y in
the dictionary such that W + Y matches the stream. Suppose that we are
compressing an English text, and the stream contains "basketball ...". We may
have the words "basket" (number 119) and "ball" (number 12) already in
the dictionary, and they would of course match the stream.

The output from this algorithm is a sequence of word-word pairs (W, Y),
or rather their numbers in the dictionary, in our case (119, 12). The resulting
word "basketball" is then added to the dictionary: each time a pair is
output to the codestream, the string from the dictionary corresponding to
W is extended with the word Y and the resulting string is added to the
dictionary.

A special case occurs if the dictionary doesn't contain even the starting 
one-character string (for example, this always happens in the first encoding 
step). In this case we output a special code word which represents the null 
symbol, followed by this character and add this character to the dictionary. 

Below there is an example of encoding, where the pair (4, 3) is composed
of the fourth word 'AC' and the third word 'G'.

ACGACACGGAC

word 1: (0, A) -> A
word 2: (0, C) -> C
word 3: (0, G) -> G
word 4: (1, 2) -> AC
word 5: (4, 3) -> ACG
word 6: (3, 4) -> GAC
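The parsing rule described above can be written down as a short program. The following is a sketch of the word-pair parsing only, not the authors' implementation: the longest match is found by brute force rather than with a tree, no bit-level output is produced, and the handling of a word W that has no matching continuation Y (emitting the pair (W, 0) and adding nothing to the dictionary) is an assumption, since the text does not specify this case. The function names are illustrative.

```python
def longest_match(stream: str, pos: int, dictionary: dict, max_len: int) -> str:
    """Longest dictionary word that is a prefix of stream[pos:], or ''."""
    for length in range(min(max_len, len(stream) - pos), 0, -1):
        word = stream[pos:pos + length]
        if word in dictionary:
            return word
    return ""


def castore_encode(stream: str):
    """Return the list of output pairs and the dictionary (word -> number)."""
    dictionary = {}  # words are numbered from 1 in order of creation
    pairs = []
    max_len = 0      # length of the longest word stored so far
    pos = 0
    while pos < len(stream):
        w = longest_match(stream, pos, dictionary, max_len)
        if not w:
            # Special case: unknown character, output (null symbol, character).
            new_word = stream[pos]
            pairs.append((0, new_word))
        else:
            y = longest_match(stream, pos + len(w), dictionary, max_len)
            if not y:
                # Assumption: no continuation available, emit W alone.
                pairs.append((dictionary[w], 0))
                pos += len(w)
                continue
            pairs.append((dictionary[w], dictionary[y]))
            new_word = w + y
        dictionary[new_word] = len(dictionary) + 1
        max_len = max(max_len, len(new_word))
        pos += len(new_word)
    return pairs, dictionary


if __name__ == "__main__":
    pairs, dictionary = castore_encode("ACGACACGGAC")
    print(pairs)  # [(0, 'A'), (0, 'C'), (0, 'G'), (1, 2), (4, 3), (3, 4)]
    print(sorted(dictionary, key=dictionary.get))
```

On a constant string the words double in length at each step ('1', '11', '1111', ...), so the number of pairs grows logarithmically with the length of the stream, in contrast with the square-root growth of LZ78.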

A modified version of the program can be used to study correlations
in the stream: it partitions the stream into fixed-size segments and
encodes them separately. The algorithm takes advantage of replicated parts in
the stream, so limiting the encoding to each window separately can result
in a longer total encoding. The difference between the length of the encoding
of the whole stream and the sum of the encodings of the windows depends on
the number of correlations between symbols at distance greater than the size
of the window. Using different window sizes it is possible to construct a
"spectrum" of correlation.

This has been applied for example to DNA sequences for the study of
mid- and long-range correlations (see Section 5).

Implementation:

The main problem in implementing this algorithm is building a structure
which allows an efficient search for words in the dictionary. For this purpose, the
dictionary is stored in a treelike structure where each node is a word X = (W,
Y), its parent node being W, and storing a link to the string representing
Y. Using this method, in order to find the longest word in the dictionary
matching exactly a string, you need only follow a branching path from the
root of the dictionary tree.